Modelling Instance-Level Annotator Reliability for Natural Language Labelling Tasks
When constructing models that learn from noisy labels produced by multiple
annotators, it is important to accurately estimate the reliability of
annotators. Annotators may provide labels of inconsistent quality due to their
varying expertise and reliability in a domain. Previous studies have mostly
focused on estimating each annotator's overall reliability on the entire
annotation task. However, in practice, the reliability of an annotator may
depend on each specific instance. Only a limited number of studies have
investigated modelling per-instance reliability, and these only considered
binary labels. In this paper, we propose an unsupervised model which can handle
both binary and multi-class labels. It can automatically estimate the
per-instance reliability of each annotator and the correct label for each
instance. We specify our model as a probabilistic model which incorporates
neural networks to model the dependency between latent variables and instances.
For evaluation, the proposed method is applied to both synthetic and real data,
including two labelling tasks: text classification and textual entailment.
Experimental results demonstrate that our novel method can not only accurately
estimate the reliability of annotators across different instances, but also
achieve superior performance in predicting the correct labels and detecting the
least reliable annotators compared to state-of-the-art baselines.
Comment: 9 pages, 1 figure, 10 tables, 2019 Annual Conference of the North
American Chapter of the Association for Computational Linguistics (NAACL 2019).
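For contrast with the per-instance model described above, the sketch below implements a classical global-reliability baseline: a "one-coin" Dawid-Skene style EM that estimates one accuracy per annotator and a posterior over each instance's true label, for binary or multi-class labels. It is a swapped-in textbook technique, not the paper's neural per-instance model. The function name `one_coin_em`, the uniform class prior, and the use of -1 for missing labels are assumptions made for illustration.

```python
import numpy as np

def one_coin_em(labels, n_classes, n_iter=50):
    """One-coin EM baseline (hypothetical helper, not the paper's model).

    labels: int array of shape (n_items, n_annotators); -1 marks a missing label.
    Returns (accuracy per annotator, posterior over true labels per item).
    Assumes a uniform prior over the true classes.
    """
    n_items, n_annot = labels.shape
    observed = labels >= 0
    # Initialise the posterior over true labels with smoothed vote counts.
    post = np.zeros((n_items, n_classes))
    for k in range(n_classes):
        post[:, k] = (labels == k).sum(axis=1)
    post = (post + 1e-6) / (post + 1e-6).sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: an annotator's accuracy is the expected fraction of their
        # observed labels that agree with the (soft) true label.
        acc = np.zeros(n_annot)
        for j in range(n_annot):
            rows = observed[:, j]
            acc[j] = post[rows, labels[rows, j]].sum() / max(rows.sum(), 1)
        acc = np.clip(acc, 1e-3, 1 - 1e-3)

        # E-step: a correct label has probability acc[j]; each wrong class
        # shares the remaining mass equally.
        log_post = np.zeros((n_items, n_classes))
        for j in range(n_annot):
            rows = observed[:, j]
            match = labels[rows, j][:, None] == np.arange(n_classes)[None, :]
            log_post[rows] += np.where(
                match, np.log(acc[j]), np.log((1 - acc[j]) / (n_classes - 1))
            )
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
    return acc, post
```

Because this baseline assigns each annotator a single accuracy for the whole task, it cannot express the instance-dependent reliability that motivates the paper; it is shown only to make the "overall reliability" setting concrete.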
Thermoelectric Properties of Ternary Tellurides and Quaternary Derivative of Tl9BiTe6
Abstract
The main focus of this work was the exploratory preparation of thermoelectric materials and the analysis of their physical properties. A thermoelectric material can convert heat to electricity or vice versa. Narrow band gap semiconductors are usually good candidates for thermoelectric applications, because such materials have a large Seebeck coefficient, reasonably high electrical conductivity and low thermal conductivity. In this work, two different systems were studied: ternary layered tellurides and quaternary derivatives of Tl9BiTe6. I tried to prepare Pb1−xBi2+xTe4 with x = 0.30, 0.10, −0.10 and = 0.30 and Pb1−xBi4+xTe7 with x = 0.15, 0.00, −0.15 and −0.35, and two pure compounds, Pb0.8Bi2.2Te4 and Pb0.9Bi2.1Te4, were obtained. Powder X-ray diffraction was used to confirm the purity of the compounds, and physical properties were measured on cold-pressed samples with densities around 80% of the theoretical value. The figure of merit of the ternary tellurides is comparable to the published value for PbBi2Te4 (0.5 at 600 K). I also investigated the quaternary series Tl8.67PbxBi1.33−xTe6 with x between 0.50 and 1.00. The purity was confirmed by powder X-ray diffraction data, and physical properties were measured on spark plasma sintered (SPS) samples. Low thermal conductivity and competitive power factors were observed. The highest ZT value was 0.57 for the compound Tl8.67Pb0.60Bi0.73Te6 at 575 K.
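The abstract relates the Seebeck coefficient, electrical conductivity and thermal conductivity to the figure of merit. The standard definitions are PF = S²σ for the power factor and ZT = S²σT/κ for the dimensionless figure of merit. The short sketch below evaluates them in SI units; the function names and the numbers in the usage check are hypothetical illustrations, not measurements from this work.

```python
def power_factor(seebeck_v_per_k, sigma_s_per_m):
    """Power factor S^2 * sigma, in W m^-1 K^-2."""
    return seebeck_v_per_k ** 2 * sigma_s_per_m

def figure_of_merit(seebeck_v_per_k, sigma_s_per_m, kappa_w_per_m_k, temperature_k):
    """Dimensionless figure of merit ZT = S^2 * sigma * T / kappa."""
    return power_factor(seebeck_v_per_k, sigma_s_per_m) * temperature_k / kappa_w_per_m_k

# Illustrative (made-up) values: S = 200 uV/K, sigma = 1e5 S/m, kappa = 1 W/(m K), T = 300 K.
zt = figure_of_merit(200e-6, 1e5, 1.0, 300.0)
```

The formula shows why a low thermal conductivity, as observed for the quaternary series, raises ZT even at an unchanged power factor.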
Evolutionary nonnegative matrix factorization for data compression
This paper aims at improving non-negative matrix factorization (NMF) to facilitate data compression. An evolutionary updating strategy is proposed to solve the NMF problem iteratively based on three sets of updating rules: multiplicative, firefly and survival-of-the-fittest rules. For data compression applications, the quality of the factorized matrices can be evaluated by measurements such as sparsity, orthogonality and factorization error, which assess compression quality in terms of storage space consumption, redundancy in the data matrix and data approximation accuracy. The fitness score function that drives the evolving procedure is therefore designed as a composite score that takes all these measurements into account. A hybrid initialization scheme is used to improve the rate of convergence, allowing multiple initial candidates generated by different NMF initialization approaches. The effectiveness of the proposed method is demonstrated using the Yale and ORL image datasets.
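Two ingredients named in the abstract can be sketched with NumPy: the classical Lee-Seung multiplicative update for the Frobenius-norm NMF objective, and a composite fitness built from sparsity, orthogonality and factorization error. The firefly and survival-of-the-fittest rules are not reproduced. The particular measures (Hoyer sparsity, mean off-diagonal cosine similarity between rows of H, relative Frobenius error) and the equal default weights are assumptions chosen for illustration; the paper's exact score may differ.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero

def multiplicative_step(V, W, H):
    """One Lee-Seung multiplicative update for min ||V - WH||_F^2, W, H >= 0."""
    H = H * (W.T @ V) / (W.T @ W @ H + EPS)
    W = W * (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H

def hoyer_sparsity(M):
    """Hoyer sparsity in [0, 1]: 0 for a flat matrix, 1 for a single nonzero."""
    x = M.ravel()
    n = x.size
    l1 = np.abs(x).sum()
    l2 = np.sqrt((x ** 2).sum()) + EPS
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1)

def orthogonality_penalty(H):
    """Mean absolute off-diagonal cosine similarity between rows of H (needs >= 2 rows)."""
    Hn = H / (np.linalg.norm(H, axis=1, keepdims=True) + EPS)
    G = Hn @ Hn.T
    r = G.shape[0]
    return (np.abs(G).sum() - np.trace(G)) / (r * (r - 1))

def relative_error(V, W, H):
    """Relative Frobenius reconstruction error ||V - WH|| / ||V||."""
    return np.linalg.norm(V - W @ H) / np.linalg.norm(V)

def fitness(V, W, H, weights=(1.0, 1.0, 1.0)):
    """Hypothetical composite score: reward sparsity, penalise redundancy and error."""
    a, b, c = weights
    return float(a * hoyer_sparsity(H) - b * orthogonality_penalty(H)
                 - c * relative_error(V, W, H))
```

In an evolutionary scheme, a score like `fitness` would rank candidate factorizations produced by the different update rules, so the weights control the trade-off between compactness and reconstruction accuracy.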
Design of Machine Learning Algorithms with Applications to Breast Cancer Detection
Machine learning is concerned with the design and development of algorithms and
techniques that allow computers to 'learn' from experience with respect to some class
of tasks and performance measure. One application of machine learning is to improve
the accuracy and efficiency of computer-aided diagnosis systems to assist physicians,
radiologists, cardiologists, neuroscientists, and health-care technologists. This thesis
focuses on machine learning and the applications to breast cancer detection. Emphasis
is laid on preprocessing of features, pattern classification, and model selection.
Before the classification task, feature selection and feature transformation may be
performed to reduce the dimensionality of the features and to improve the classification
performance. A genetic algorithm (GA) can be employed for feature selection based
on different measures of data separability or the estimated risk of a chosen classifier.
A separate nonlinear transformation can be performed by applying kernel principal
component analysis and kernel partial least squares.
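The nonlinear feature transformation mentioned above can be illustrated with kernel principal component analysis. The sketch below uses an RBF kernel, centers the Gram matrix in feature space, and returns the projections of the training samples onto the leading components. The kernel choice and the `gamma` value are assumptions; the thesis may use other kernels or parameter settings.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kernel_pca(X, n_components, gamma=1.0):
    """Project training samples onto the top kernel principal components."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    one = np.full((n, n), 1.0 / n)
    # Center the data implicitly in the kernel-induced feature space.
    Kc = K - one @ K - K @ one + one @ K @ one
    eigvals, eigvecs = np.linalg.eigh(Kc)          # ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]
    lambdas, alphas = eigvals[idx], eigvecs[:, idx]
    # Kc @ alpha / sqrt(lambda) simplifies to alpha * sqrt(lambda).
    return alphas * np.sqrt(np.maximum(lambdas, 0.0))
```

The returned columns are ordered by explained variance, so keeping only the first few columns gives the reduced feature set that a downstream classifier would receive.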
Several classifiers are proposed in this work. The SOM-RBF network combines
self-organizing maps (SOMs) and radial basis function (RBF) networks, with the RBF
centers set as the weight vectors of neurons from the competitive layer of a trained
SOM. The pairwise Rayleigh quotient (PRQ) classifier seeks one discriminating boundary
by maximizing an unconstrained optimization objective, called the PRQ criterion,
formed with a set of pairwise constraints instead of individual training samples.
The strict 2-surface proximal (S2SP) classifier seeks two proximal planes that are not
necessarily parallel to fit the distribution of the samples in the original feature space or
a kernel-defined feature space, by maximizing two strict optimization objectives with
a 'square of sum' optimization factor. Two variations of the support vector data description
(SVDD) with negative samples (NSVDD) are proposed using different
forms of slack vectors; they learn a closed, spherically shaped boundary, called the
supervised compact hypersphere (SCH), around a set of samples in the target class. We
extend the NSVDDs to solve multi-class classification problems based on distances
between the samples and the centers of the learned SCHs in a kernel-defined feature
space, using a combination of linear discriminant analysis and the nearest-neighbor rule.
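The SOM-RBF idea (RBF centers taken from the weight vectors of a trained SOM) can be sketched as follows. A small 2-D SOM is trained online, its weights become Gaussian RBF centers, and the output layer is fitted by linear least squares on one-hot targets. The grid size, learning-rate and neighborhood schedules, the shared RBF width and the least-squares output layer are all assumptions for illustration, not settings from the thesis.

```python
import numpy as np

def train_som(X, grid=(3, 3), n_epochs=20, lr0=0.5, seed=0):
    """Online SOM training with linearly decaying learning rate and neighborhood."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    W = X[rng.choice(len(X), rows * cols, replace=len(X) < rows * cols)].astype(float)
    sigma0 = max(rows, cols) / 2.0
    n_steps, t = n_epochs * len(X), 0
    for _ in range(n_epochs):
        for x in X[rng.permutation(len(X))]:
            frac = t / n_steps
            lr = lr0 * (1 - frac)
            sigma = sigma0 * (1 - frac) + 0.5
            bmu = np.argmin(((W - x) ** 2).sum(axis=1))   # best-matching unit
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)  # distance on the grid
            h = np.exp(-d2 / (2 * sigma ** 2))              # neighborhood weights
            W += lr * h[:, None] * (x - W)
            t += 1
    return W

def rbf_design(X, centers, width):
    """Gaussian RBF activations plus a bias column."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.hstack([np.exp(-d2 / (2 * width ** 2)), np.ones((len(X), 1))])

def fit_som_rbf(X, y, n_classes, grid=(3, 3), width=1.0):
    centers = train_som(X, grid)
    T = np.eye(n_classes)[y]                                # one-hot targets
    weights, *_ = np.linalg.lstsq(rbf_design(X, centers, width), T, rcond=None)
    return centers, weights

def predict_som_rbf(X, centers, weights, width=1.0):
    return np.argmax(rbf_design(X, centers, width) @ weights, axis=1)
```

Using SOM prototypes as centers places the RBF units where the data actually lie, which is the motivation the paragraph gives for combining the two networks.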
The problem of model selection is studied to pick the best values of the hyperparameters
for a parametric classifier. To choose the optimal kernel or regularization
parameters of a classifier, we investigate different criteria, such as the validation error
estimate and the leave-one-out bound, as well as different optimization methods, such
as grid search, gradient descent, and GA. By viewing the tuning problem of the multiple
parameters of a 2-norm support vector machine (SVM) as an identification problem
of a nonlinear dynamic system, we design a tuning system by employing the extended
Kalman filter based on cross validation. Independent kernel optimization based on
different measures of data separability is also investigated for different kernel-based
classifiers.
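Model selection by grid search on a cross-validated error estimate can be sketched as follows. To keep the example short and dependency-free, a kernel ridge classifier with an RBF kernel stands in for the SVM; this substitution, the fold count and the candidate grids are assumptions, and the extended Kalman filter tuner is not reproduced.

```python
import numpy as np
from itertools import product

def rbf_kernel(A, B, gamma):
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kernel_ridge_fit(X, y, gamma, lam):
    """Dual coefficients of kernel ridge regression on +/-1 labels."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def cv_error(X, y, gamma, lam, k=5, seed=0):
    """k-fold cross-validation estimate of the misclassification rate."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        val = folds[i]
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        alpha = kernel_ridge_fit(X[tr], y[tr], gamma, lam)
        pred = np.sign(rbf_kernel(X[val], X[tr], gamma) @ alpha)
        errors.append(np.mean(pred != y[val]))
    return float(np.mean(errors))

def grid_search(X, y, gammas, lams, k=5):
    """Return the (gamma, lambda) pair with the lowest CV error, plus all scores."""
    scores = {(g, l): cv_error(X, y, g, l, k) for g, l in product(gammas, lams)}
    best = min(scores, key=scores.get)
    return best, scores
```

Grid search evaluates every combination, so its cost grows multiplicatively with the number of hyperparameters; this is the limitation that gradient-based, GA-based or filter-based tuners aim to reduce.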
Numerous computer experiments using benchmark datasets verify the theoretical
results and compare the techniques in terms of classification accuracy or the area
under the receiver operating characteristic (ROC) curve. Computational
requirements, such as the computing time and the number of hyperparameters, are
also discussed.
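The area under the ROC curve used in these comparisons equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition; the function name is illustrative.

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC via pairwise comparison (Mann-Whitney form); ties count 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

For example, scores `[0.1, 0.4, 0.35, 0.8]` with labels `[0, 0, 1, 1]` give 3 of 4 correctly ordered positive/negative pairs, so the AUC is 0.75. The pairwise form is O(n_pos × n_neg) in memory, which is fine for datasets of the size discussed here.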
All of the presented methods are applied to breast cancer detection from fine-needle
aspiration data and mammograms, as well as to the screening of knee-joint vibroarthrographic
signals and the automatic monitoring of roller bearings with vibration signals. Experimental
results demonstrate the effectiveness of these methods, with improved classification
performance.
For breast cancer detection, instead of only providing a binary diagnostic decision
of 'malignant' or 'benign', we propose methods to assign a measure of confidence
of malignancy to an individual mass, by calculating probabilities of being benign and
malignant with a single classifier or a set of classifiers.
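One common way to turn classifier outputs into a confidence of malignancy is to map each classifier's decision value through a fitted sigmoid (Platt-style scaling) and, for a set of classifiers, average the resulting probabilities. The sketch below assumes the sigmoid parameters `(a, b)` have already been fitted on validation data for each classifier; both that fitting step and simple averaging are assumptions, not necessarily the combination rule used in the thesis.

```python
import numpy as np

def platt_probability(decision_values, a, b):
    """P(malignant | f) = 1 / (1 + exp(a * f + b)); a < 0 makes larger f more malignant."""
    f = np.asarray(decision_values, dtype=float)
    return 1.0 / (1.0 + np.exp(a * f + b))

def ensemble_malignancy(decision_matrix, params):
    """Average calibrated probabilities over classifiers.

    decision_matrix: array (n_classifiers, n_masses) of decision values.
    params: one hypothetical (a, b) sigmoid pair per classifier.
    Returns (P(malignant), P(benign)) per mass.
    """
    probs = np.array([platt_probability(d, a, b)
                      for d, (a, b) in zip(decision_matrix, params)])
    p_malignant = probs.mean(axis=0)
    return p_malignant, 1.0 - p_malignant
```

Reporting the two probabilities instead of a hard label lets a radiologist see how close a mass is to the decision boundary.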
Computation of Heterogeneous Object Co-embeddings from Relational Measurements
Dimensionality reduction and data embedding methods generate low-dimensional representations of a single type of homogeneous data objects. In this work, we examine the problem of generating co-embeddings, or pattern representations, of two different types of objects within a joint common space of controlled dimensionality, where the only available information is assumed to be a set of pairwise relations or similarities between instances of the two groups. We propose a new method that models the embedding of each object type symmetrically to the other type, subject to flexible scale constraints and weighting parameters. The embedding generation relies on an efficient optimization solved via matrix decomposition, which is also extended to support multidimensional co-embeddings. We also propose a heuristic scheme for reducing the model parameters, and a simple way of measuring the conformity between the original object relations and those re-estimated from the co-embeddings, so that model selection can be achieved by identifying the optimal model parameters with a simple search procedure. The capabilities of the proposed method are demonstrated with multiple synthetic and real-world datasets from the text mining domain. The experimental results and comparative analyses indicate that the proposed algorithm outperforms existing methods for co-embedding generation.
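A simple way to see how a matrix decomposition can yield symmetric co-embeddings is the truncated SVD of the relation matrix R: with R ≈ U_k Σ_k V_kᵀ, rows get coordinates U_k Σ_k^{1/2} and columns get V_k Σ_k^{1/2}, so inner products between the two sets approximate the original relations. The sketch below also measures conformity as the Pearson correlation between original and re-estimated relations. This plain-SVD construction and correlation-based conformity are assumptions for illustration; the paper's scale constraints and weighting parameters are not modelled.

```python
import numpy as np

def svd_coembedding(R, dim):
    """Embed row objects and column objects of R in a shared dim-dimensional space."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    root = np.sqrt(s[:dim])
    X = U[:, :dim] * root      # coordinates of the first object type (rows of R)
    Y = Vt[:dim].T * root      # coordinates of the second object type (columns of R)
    return X, Y

def conformity(R, X, Y):
    """Pearson correlation between original relations and those re-estimated as X @ Y.T."""
    R_hat = X @ Y.T
    return float(np.corrcoef(R.ravel(), R_hat.ravel())[0, 1])
```

As a usage pattern, one could scan `dim = 1, 2, ...` and keep the smallest dimensionality whose conformity exceeds a chosen threshold, which mirrors the idea of selecting model parameters with a simple search over a conformity measure.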